ABSTRACT


An  information  system  is  defined  as  a  chain  of  information  services: 
inquiring -- data-storing -- encoding -- transmitting -- decoding -- deciding.
Each  is  a  transformer  represented,  in  general,  by  a  stochastic  matrix  and  a 
cost  function.  The  inputs  of  "inquiring"  are  the  benefit-relevant  events 
(possibly  statistical  parameters).  Actions  are  outputs  of  "deciding." 

Together,  actions  and  events  determine  the  benefits.  Other  outputs  of  a 
service  are:  (a)  inputs  into  the  successive  service,  and  (b)  contributions 
to  the  cost  of  acquiring  and  operating  the  information  system. 

The decision theory of economists and statisticians has usually neglected
the subsequence "data-storing -- encoding -- transmitting -- decoding."
Communication engineers, on the other hand, have neglected the inquiring and
deciding services and have usually equated benefit with the non-occurrence
of error in the communication of data. With data pre-stored, long sequences
of messages can be communicated without prohibitive delays; and useful
asymptotic properties of the "information amount transmitted" and the
"channel capacity" follow. These quantities are relevant to the communication
cost but to neither the cost nor the benefit of inquiring and deciding.

Suppose  the  utility  to  the  "manager"  (the  "organizer,"  the  "meta-decider") 
is  known  to  be  additive  in  benefit  and  cost  (both  appropriately  scaled),  and 
his "prior" probability of events is known. Then, and only then, the
("efficient")  subset  of  all  feasible  information  systems  for  which  the  pair 
"expected  benefit,  expected  cost"  is  not  dominated  by  that  of  any  other  system, 
will  contain  all  optimal  systems.  An  optimal  system  can  then  be  determined 
by  a  manager  compelled  to  search  for,  and  to  apply,  his  "scaling  functions" 
expressing  benefits  and  costs  in  the  same  units. 

Correspondingly, pure communication theory has assumed, in effect, utility
to be additive in the following criteria (all undesirable, costly, or
delay-producing): occurrence of communication error; length of code word;
size of code; and channel capacity. However, for the efficient choice of the
total chain of information services, factors determining the cost of inquiring
(e.g., sample size) and of deciding (e.g., computer memory) must also be
considered, each properly transformed to become an additive component of
utility; and an (additive) overall benefit must replace the criterion of
"communication error."

1. INTRODUCTION

1.1. This is an attempt to clear up important misunderstandings and to
achieve conceptual unity between, on the one hand, the economists and
statisticians concerned with efficient decision and organization and, on the
other, the communication engineers who have created what has come to be called
information theory. Related thought of workers in the logic and psychology of
language and of problem-solving should also find its place in the common
conceptual framework.

1.2.  The  manager  buys  instruments  or  hires  services.  The  distinction  is 
not  relevant  for  the  general  statement  of  our  problem.  We  shall  therefore  speak, 
for  brevity,  of  services  only,  with  the  understanding  that  in  any  particular 
application  the  size  of  a  stock  of  instruments  will  be  carefully  distinguished 
from  the  number  of  machine-hours  (or  man-hours)  of  a  service. 

1.3. The term information system will denote the sequence of information
services, viz., the services of inquiring, communicating, and deciding, in that
order. More precisely, communication is itself a sequence of encoding,
transmitting, and decoding. There is also another component of the sequence,
called storing. It can be intermediate between any two consecutive information
services and in particular between inquiring and encoding.

Each information service can be regarded as a transformer of its
characteristic inputs into outputs. On Figures 1, 2, 3, transformers are boxes;
variables (sets of values of inputs or outputs) are circles. Variables are
denoted by lower case Latin letters. Transformers (functions) are denoted by
Greek letters, with the exception of encoding and decoding.

2. INQUIRING AND DECIDING, AT CONSTANT COST

2.1. Figure 2, "Inquiring, Communicating, Deciding," is more complete
than either Figure 1, "Inquiring and Deciding," or Figure 3, "Communication only."
But it will be convenient to start with Figure 1, which omits the communication
aspect, and considers only two information services, "inquiring" and "deciding."
We shall also disregard, for a moment, the symbols $k$, $\kappa_\lambda$,
$\kappa_\alpha$, all referring to cost. We consider the information system
consisting of two consecutive transformers, $\lambda$ (inquiring) and $\alpha$
(deciding). In the language of decision theory, $\lambda$ is also called
"experiment," or (in application to medicine) "diagnostic tool." The
transformer $\alpha$ is called "rule of action" or "decision rule." The inputs
of the "inquiring" box are "events" x and its outputs are "data" y (also called
"observations"). The inputs of the "deciding" box are the data y and its
outputs are actions a. Thus

$$\lambda(x) = y, \quad \alpha(y) = a; \quad \text{therefore} \quad a = \alpha(\lambda(x)), \quad \text{or simply} \quad a = \lambda\alpha(x),$$

with the understanding that the last transformation is entered last. Thus, the
information system $\lambda\alpha$ has transformed an event x into an action a.
The manager must choose, from some available (feasible) set of such pairs
$\lambda\alpha$, one that is "efficient." Still disregarding for a while the
costs associated with each information system, we define the transformer
"criterion function" (or, better, gross payoff, or benefit, function) $\gamma$,
which transforms the input pair (x, a) into the output g, the "gross payoff,"
or benefit. It depends on the chosen $\lambda\alpha$ thus:

$$g = \gamma(x, a) = \gamma(x, \lambda\alpha(x)).$$

2.2. Events x are, in general, random variables, distributed with
("prior") probabilities $\pi_x$; moreover, the transformer $\lambda$,
"inquiring," and possibly also the transformer $\alpha$, "deciding," are
"noisy," in a sense to be explained presently. As a result, gross payoff is
also a random variable. By definition, it measures the desirability to the
manager of the outcomes of the actions, in the following sense: if costs did
not depend on the chosen information system $\lambda\alpha$, he would prefer
the system yielding a higher expected payoff to one yielding a lower expected
payoff; the word "expected" meaning the average of payoffs weighted by their
respective probabilities. These probabilities depend on the "prior"
probabilities $\pi_x$ of events x, and on the conditional probabilities
characterizing the inquiry $\lambda$, and possibly the "deciding" transformer
$\alpha$, as follows.

2.3. Should the inquiring be free of errors, "noiseless," the symbol $\lambda$
stands for an ordinary function, associating every event x with exactly one
observation y. In general, it will not be a one-to-one mapping ("perfect
inquiry" is a special, limiting case); rather, it will be a many-to-one mapping:
two events, x and x', may yield, for some action a, two distinct payoffs,

$$\gamma(x, a) \neq \gamma(x', a),$$

but the inquiring service may not distinguish between x and x' (it will be
"coarser" than a perfect inquiry service):

$$\lambda(x) = \lambda(x').$$

2.4. However, a still more general case is a many-to-many mapping. Then to
each $x = x_0$ corresponds not one observation (datum) y, but an array of
conditional probabilities $p(y|x_0)$, summing up to 1 over all observations y.
The inquiry $\lambda$ is then represented by a (Markov) matrix whose rows are
such arrays of conditional probabilities of observations, given the events. We
shall write $\lambda = [\lambda_{xy}]$ where $\lambda_{xy} = p(y|x)$; $\lambda$
is called the likelihood matrix.

Thus, inquiry $\lambda$ is, in general, a "stochastic transformation." When we
write

$$y = \lambda(x),$$

we shall mean, in general, that the conditional probabilities
$p(y|x) = \lambda_{xy}$ are elements of the matrix $\lambda$. In the noiseless
case, each row of this matrix contains one element 1 (and the rest are
therefore zeros); in the "perfect" case, $\lambda$ is an identity matrix,
provided the columns are labelled appropriately.
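
The three kinds of inquiry distinguished in 2.3 and 2.4 can be told apart by inspecting the likelihood matrix. The following is only a sketch: the function name `classify_inquiry`, the tolerance, and the three example matrices are assumptions made for illustration and do not appear in the text.

```python
def classify_inquiry(matrix, tol=1e-9):
    """Label a row-stochastic matrix as 'perfect', 'noiseless' or 'noisy'."""
    for row in matrix:
        if any(entry < -tol for entry in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError("each row must be a probability distribution")
    # Noiseless: every row puts all of its probability on one observation.
    zero_one = all(
        all(abs(entry) <= tol or abs(entry - 1.0) <= tol for entry in row)
        for row in matrix
    )
    if not zero_one:
        return "noisy"
    # Perfect: noiseless and square with exactly one 1 per column, i.e. an
    # identity matrix once the columns are labelled appropriately.
    n_columns = len(matrix[0])
    column_sums = [sum(row[j] for row in matrix) for j in range(n_columns)]
    is_square = len(matrix) == n_columns
    if is_square and all(abs(total - 1.0) <= tol for total in column_sums):
        return "perfect"
    return "noiseless"


# Hypothetical example matrices (rows: events, columns: observations).
perfect = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # first two events not told apart
noisy = [[0.8, 0.2], [0.2, 0.8]]
labels = [classify_inquiry(m) for m in (perfect, coarse, noisy)]
```

The same check applies unchanged to a deciding matrix, with data as rows and actions as columns.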

2.5. We could make analogous statements about the "deciding" service, $\alpha$.
A decision service can be "perfect" (perfectly flexible), or "coarse but
noiseless," or "noisy," depending on whether $\alpha$ represents a one-to-one,
many-to-one, or many-to-many mapping of data into actions. Intuitively, the
reason why a noisy inquiry is chosen is that noiseless (and, even more so,
perfect) inquiries are costly, or are not available at all. Similarly a decider
(especially if we think not of our ideal manager but of an employee or a
machine at his service) may use some non-sophisticated, coarse rules, or may
make errors from time to time, and yet be worth hiring if he is sufficiently
cheap. But before introducing cost of the inquiring and deciding explicitly,
note that the expression $\lambda\alpha$ can be conveniently read as the
product of Markov matrices, since, if $a = \lambda\alpha(x)$, then indeed the
conditional probability $p(a|x)$ is equal to
$\sum_y \lambda_{xy}\alpha_{ya}$, and this is the (x,a)-th element of the
product matrix $\lambda\alpha$. The expected gross payoff G, say, can then be
written as

$$E(g) = \sum_x \sum_y \sum_a \pi_x \lambda_{xy} \alpha_{ya} \gamma(x, a) = G(\lambda, \alpha; \pi, \gamma),$$

where the semicolon separates the entities to be chosen ("controlled") by the
manager, from those given to him ("non-controlled"). If costs did not depend
on his choice of the information system $\lambda\alpha$, he would maximize the
expected gross payoff G over the set of available information systems
$\{\lambda\alpha\}$, say. The chosen system (or the set of equally good
systems, none worse than any other available one) would depend on the givens,
i.e., on the prior probability function $\pi$ and the gross payoff function
$\gamma$.

2.6. We can also rewrite the expected gross payoff more explicitly, in
terms of the elements of the Markov matrices involved, thus:

$$E(g) = \sum_{x,y,a} \pi_x \lambda_{xy} \alpha_{ya} \gamma(x, a);$$

since the variables x, y, a are "killed" by the triple summation over all
their values, G is again seen to depend only on the choice of the information
system $\lambda\alpha$ and on the givens $\pi$, $\gamma$. At the bottom of
Figure 1, a simpler expression for the expected gross payoff is given, valid if
the deciding service, $\alpha$, is noiseless (as we may assume for simplicity
in what follows).
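
The triple summation of 2.5 and 2.6 can be sketched in a few lines. The prior, the two Markov matrices, and the payoff table below are hypothetical numbers chosen only for illustration; the function names are likewise assumptions. The sketch also checks that the same value is obtained by first forming the product matrix of conditional probabilities p(a|x).

```python
def expected_gross_payoff(prior, inquiry, decision, payoff):
    """Sum over x, y, a of prior[x] * inquiry[x][y] * decision[y][a] * payoff[x][a]."""
    total = 0.0
    for x, p_x in enumerate(prior):
        for y, p_y_given_x in enumerate(inquiry[x]):
            for a, p_a_given_y in enumerate(decision[y]):
                total += p_x * p_y_given_x * p_a_given_y * payoff[x][a]
    return total


def matrix_product(left, right):
    """Product of two row-stochastic matrices given as lists of rows."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


# Hypothetical givens and choices (not taken from the text).
prior = [0.5, 0.5]                    # prior probabilities of the events
inquiry = [[0.8, 0.2], [0.2, 0.8]]    # rows: events x, columns: data y
decision = [[1.0, 0.0], [0.0, 1.0]]   # noiseless rule: datum y leads to action a = y
payoff = [[1.0, 0.0], [0.0, 1.0]]     # payoff[x][a]

G = expected_gross_payoff(prior, inquiry, decision, payoff)

# Same quantity via the product matrix, whose (x, a) element is p(a | x).
system = matrix_product(inquiry, decision)
G_via_product = sum(
    prior[x] * system[x][a] * payoff[x][a]
    for x in range(len(prior))
    for a in range(len(payoff[0]))
)
```

With these numbers both routes give an expected gross payoff of 0.8, the probability that the action matches the event.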

3.  INTRODUCING  COST  VARIABLE 

3.1. Now to the costs. The cost of inquiry depends on the nature of the
inquiry (e.g., noisy inquiry is cheaper; a small sample is cheaper than a
large one) but also on the particular event that happens to occur. Thus the
cost is a function of x,

$$\kappa_\lambda(x), \text{ say:}$$

a random variable. (To take sampling again as an example, the cost of a survey
of housewives' attitudes to a product will depend on whether the subject was at
home on the first visit.) Similarly the cost of a deciding service, e.g., of
the decisions to re-order for inventories, will depend on how sophisticated
the re-ordering rule is, but also on the random level of the stock at hand, and
of the demand predicted by that survey of housewives. Thus the cost of deciding
would be, in general, a random variable,

$$\kappa_\alpha(y), \text{ say.}$$

The expression for the expected cost K is computed at the bottom of Figure 1.
Again, it depends, of course, on the information system $\lambda\alpha$ chosen
by the manager; and on the prior probability $\pi$ and the cost functions
$\kappa_\lambda$, $\kappa_\alpha$ which are given to him.

3.2. When costs were supposed, temporarily, to be independent of the
information system (in 2.2), the manager maximized the expected gross payoff,
this being the unique criterion of choice. In this case, so-called utility was
identical with gross payoff. Now cost has entered as a second criterion (so
that calling $\gamma$ the "criterion function" is not a good terminology). The
utility to the firm, as viewed by the manager (and to be called simply "the
manager's utility") is defined as that quantity the expected value of which he
tries to maximize by his choice of the information system (which, as we recall,
includes deciding as its last component). The utility is now a function of two
numerical criteria, gross payoff and cost, increasing in the former and
decreasing in the latter. Three cases must be distinguished:

1) Utility is a linear function (a weighted sum) of the two, appropriately
scaled criteria, $u(g, k) = wg - k$, with the coefficient (weight, conversion
rate) w known; for example, both gross payoff and cost are measured in dollars
so that $w = 1$.

2) Utility is a linear function, as above, but w is unknown. In cases
1) and 2) the utility is said to be decomposable (into the component criteria,
with respect to each of which it is monotone); if this is not the case we have
case 3): u(g, k) cannot be represented as linear in known transforms of g and k.

In case 1) one computes "net expected payoff" as the difference between
expected gross payoff G and the expected cost K. Or, a little more generally,
one first multiplies one of the criteria by the conversion coefficient w.
In case 2), with w unknown, it is still true that
$E(u) = wE(g) - E(k) = wG - K$. We say that a choice of information system
that results in G, K dominates another system, which yields G', K', say, if
either

$$G \geq G',\ K < K' \quad \text{or} \quad G > G',\ K \leq K'.$$

It is clear that in case 2) (and, of course, 1) as well) a system that
dominates another system with respect to the expected value of the two
criteria will also yield a higher expected utility. One can construct, from
the knowledge of the expected values of g and k of all feasible systems,
the so-called efficient set consisting of all those feasible systems that are
not dominated by any feasible system. All the optimal (but possibly also some
non-optimal) systems will be contained in the efficient set. This reduction
of the feasible to the efficient set is important in practice. It permits the
manager (or his superior, the board, say) to narrow down the choice and to
"try out" various values of w: to do some "soul-searching" regarding the
conversion rate between benefits and costs, not in the abstract but in the
light of concrete possibilities.
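
The reduction from the feasible to the efficient set, and the "trying out" of conversion rates w, might be sketched as follows. The four systems and their (G, K) pairs are hypothetical, and domination is taken in the usual sense (no worse on both criteria, strictly better on at least one), which is an assumption about the intended definition.

```python
def dominates(first, second):
    """True if the (G, K) pair `first` dominates the (G, K) pair `second`."""
    g1, k1 = first
    g2, k2 = second
    return g1 >= g2 and k1 <= k2 and (g1 > g2 or k1 < k2)


def efficient_set(systems):
    """Keep the systems whose (G, K) pair is not dominated by any other system."""
    return {
        name: pair
        for name, pair in systems.items()
        if not any(
            dominates(other_pair, pair)
            for other_name, other_pair in systems.items()
            if other_name != name
        )
    }


def best_for_weight(systems, w):
    """System maximizing expected utility w*G - K for a trial conversion rate w."""
    return max(systems, key=lambda name: w * systems[name][0] - systems[name][1])


# Hypothetical feasible systems: name -> (expected gross payoff G, expected cost K).
systems = {"A": (0.60, 1.0), "B": (0.80, 3.0), "C": (0.75, 4.0), "D": (0.95, 8.0)}

efficient = efficient_set(systems)  # C drops out: B gives more benefit at less cost
choices = {w: best_for_weight(systems, w) for w in (5, 20, 100)}
```

Whatever positive w is tried, the maximizer always lies in the efficient set, so the search over w can be confined to it.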

3.3. In case 3), however, an optimal system (i.e., one with maximum expected
utility) may have lower criterion expectations than, and thus be dominated by,
a non-optimal system. (This mathematical result is due to the fact that
expectation is a linear operator, while the utility in this case is not linear
in the criteria.) In this case, our Figures 1, 2, 3 would
have to be redrawn. The criterion function would yield directly the utility
u(g, k), with the cost k as one of its inputs, along with x and a.
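
A small numerical illustration of this point, under loudly stated assumptions: the utility function u(g, k) = g / (1 + k) and the two lotteries below are hypothetical, chosen only because the function is increasing in g, decreasing in k, and not additive in transforms of the two criteria.

```python
def expectation(lottery, function):
    """Expected value of function(g, k) over (probability, g, k) triples."""
    return sum(prob * function(g, k) for prob, g, k in lottery)


def utility(g, k):
    """Hypothetical non-decomposable utility (an assumption, not from the text)."""
    return g / (1.0 + k)


# Each system is a lottery over (probability, gross payoff g, cost k).
system_1 = [(1.0, 4.0, 1.0)]
system_2 = [(0.5, 0.0, 0.0), (0.5, 9.0, 2.0)]

G1 = expectation(system_1, lambda g, k: g)
K1 = expectation(system_1, lambda g, k: k)
G2 = expectation(system_2, lambda g, k: g)
K2 = expectation(system_2, lambda g, k: k)
U1 = expectation(system_1, utility)
U2 = expectation(system_2, utility)
```

Here system 2 has the higher expected gross payoff at the same expected cost, so it dominates system 1 in the criterion expectations, yet system 1 has the higher expected utility.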

3.4. In the present three figures, the circles "GROSS PAYOFF" and "COST"
have been drawn with "auras" to indicate their dignity as criteria. But in the
case of non-decomposable utility (case 3), there would be only one criterion,
"utility" itself, to replace "gross payoff," and provided with an aura; the
circle "cost" would lose its aura, and have, instead, an output arrow leading
from it to the criterion function.

3.5. In most of the practical work of engineers as well as statisticians
and economists, non-decomposable utilities are assumed away, for reasons of
simplicity. For this assumption permits one to operate with expected values of
the individual criteria into which utility is decomposed; and this remains
possible, whatever the relevant probability distributions. At some later stage,
however, it may become possible to approach more general cases.

4.  NO  ROLE  FOR  ENTROPY  FORMULAS? 

4.1. So far, we have neglected communication. Or, equivalently, we have
assumed it to be perfect. That is, we have assumed, in effect, a one-to-one
correspondence between the data (observations) put out by inquiring, and the
inputs of deciding, which we shall later call, as in Figure 2, "messages
decoded." In the context of communication, "entropy formulas" for so-called
"information amount" and "capacity" will be introduced. In the context of
inquiring and deciding these formulas do not seem to play a role, in spite of
numerous writers who have attempted to link the economics of inquiring and
deciding with certain results of pure communication economics.

Neither the expected gross payoff nor the expected cost of inquiring and
deciding is related to formulas involving logarithms of the relevant
probabilities, as the entropy formulas are.

4.2. In our notation, the entropy formulas depend only on the probabilities
$\pi_x$ and $\lambda_{xy}$. In Figure 1 and in Section 2, the expected gross
payoff of a system depended, in addition, on the gross payoff function $\gamma$
and the decision function $\alpha$. To be sure, the payoff of inquiry alone can
be evaluated assuming that the appropriate optimal decision rule is used. We
obtain

$$\max_\alpha G(\lambda, \alpha; \pi, \gamma) = G_{\pi\gamma}(\lambda), \text{ say.}$$

This quantity, sometimes called "value of inquiry," does not depend on
$\alpha$, but still depends on the gross payoff function $\gamma$, which will
differ from one user of the inquiry service to another. Yet $\gamma$ does not
enter the entropy formulas.
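
The maximization over decision rules can be carried out datum by datum: for each observation y one picks the action with the highest prior-weighted payoff, so a noiseless rule attains the maximum. The sketch below relies on that standard observation; the function name and all numbers are hypothetical. It shows two users with the same prior and the same inquiry but different payoff functions.

```python
def value_of_inquiry(prior, inquiry, payoff):
    """Max over decision rules of the expected gross payoff.

    For each datum y the best action maximizes
    sum over x of prior[x] * inquiry[x][y] * payoff[x][a].
    """
    n_events = len(prior)
    n_data = len(inquiry[0])
    n_actions = len(payoff[0])
    total = 0.0
    for y in range(n_data):
        total += max(
            sum(prior[x] * inquiry[x][y] * payoff[x][a] for x in range(n_events))
            for a in range(n_actions)
        )
    return total


# Hypothetical givens: same prior and likelihood matrix for both users.
prior = [0.5, 0.5]
inquiry = [[0.8, 0.2], [0.2, 0.8]]

payoff_user_1 = [[1.0, 0.0], [0.0, 1.0]]      # small stakes
payoff_user_2 = [[100.0, 0.0], [0.0, 100.0]]  # large stakes

value_1 = value_of_inquiry(prior, inquiry, payoff_user_1)
value_2 = value_of_inquiry(prior, inquiry, payoff_user_2)
```

The two values differ by the factor of the stakes, although any formula built from the prior and the likelihood matrix alone would be the same for both users.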

4.3. Suppose the chance of rain a year from now is 50%. Suppose the
chance is also 50% that the stock of a corporation in which I hold all my
investments will become worthless a year from now. A forecaster whose foresight
I absolutely trust offers to tell me whether it will rain or to tell me whether,
if I am not careful, I shall lose my fortune. In both cases he will charge
$1,000, arguing that the amount of information he sells is the same in both
cases, viz., exactly

$$-\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit.}$$

Yet, I shall not be indifferent between his two offers. For losing my property
is much worse than getting wet: that is, I do take account of the payoff
function, when choosing between inquiring services offered to me.
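
The forecaster's argument and the reply to it can be put side by side numerically. In the sketch below the dollar payoffs for the rain and the stock problems are invented purely for illustration, as are the function names; only the 50-50 probabilities come from the text.

```python
import math


def entropy_bits(probabilities):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def gain_from_perfect_forecast(prior, payoff):
    """Expected payoff when the event is known in advance, minus the best
    expected payoff obtainable without any forecast. payoff[x][a] is indexed
    by event x and action a."""
    informed = sum(p * max(row) for p, row in zip(prior, payoff))
    n_actions = len(payoff[0])
    uninformed = max(
        sum(p * row[a] for p, row in zip(prior, payoff))
        for a in range(n_actions)
    )
    return informed - uninformed


prior = [0.5, 0.5]

# Hypothetical payoffs in dollars.
# Rain problem: events (rain, dry), actions (carry umbrella, go without).
payoff_rain = [[-1.0, -10.0], [-1.0, 0.0]]
# Stock problem: events (worthless, sound), actions (sell now, hold).
payoff_stock = [[0.0, -100000.0], [-5000.0, 0.0]]

bits = entropy_bits(prior)
gain_rain = gain_from_perfect_forecast(prior, payoff_rain)
gain_stock = gain_from_perfect_forecast(prior, payoff_stock)
```

Both forecasts carry one bit, but under these assumed payoffs the rain forecast would not be worth the fee while the stock forecast would.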

4.4. To illustrate the behavior of cost as well as expected payoff of an
inquiry as a function of the matrix, consider the "binary symmetric" case with
x and y each taking just two values, labeled 1 and 2, and with

$$\lambda_{11} \equiv p(y=1|x=1) = p(y=2|x=2) \equiv \lambda_{22} = p, \text{ say.}$$

Without loss of generality, let p be not less than 1/2.

It has been shown that the value of inquiry (defined, as we have seen,
under assumption of the optimal decision rule), $G_{\pi\gamma}(p)$, say, while
depending on $\pi$ and $\gamma$, is non-decreasing in p, regardless of $\pi$
and $\gamma$. (This is plausible intuitively. Remember that 1-p is, in the
statisticians' language, the probability of error of either kind.) As to the
cost, it is plausible to let it be linear, increasing in the size n of a
sample. Interpret events and data as follows:

x = 1 or 2 according as the mean of a normal distribution with
unit variance is +0.1 or -0.1;

y = 1 or 2 according as the mean of a sample of n is or is not positive.

To achieve a binary symmetric inquiry characterized by p, the sample size n
must be equal to $100 \cdot (F^{-1}(p))^2$, where $F^{-1}$ is the inverse of
the cumulative normal distribution with mean 0 and unit variance. And the cost
of inquiry would be linear in this expression. Again, no relation between an
entropy formula and either the value or the cost of inquiry! The value is some
non-decreasing function of p depending on the payoff function, while the
"amount of information" does not. The cost is a certain increasing convex
function of p, again not related to the "amount of information" in any
transparent way [nor to the "capacity" of the matrix $\lambda$, which is
$1 + p \log_2 p + (1-p) \log_2 (1-p)$].
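
The two functions of p contrasted in 4.4 can be tabulated directly. This is a sketch using Python's standard `statistics.NormalDist` for the inverse cumulative normal distribution; the function names and the grid of p values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size(p):
    """Sample size n = 100 * (F^{-1}(p))^2 needed for reliability p (0 < p < 1)."""
    return 100.0 * NormalDist().inv_cdf(p) ** 2


def capacity_bits(p):
    """Capacity 1 + p*log2(p) + (1-p)*log2(1-p) of the binary symmetric matrix."""
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


# Tabulate both quantities on an arbitrary grid of reliabilities.
grid = (0.6, 0.7, 0.8, 0.9, 0.95, 0.975, 0.99)
table = [(p, sample_size(p), capacity_bits(p)) for p in grid]

for p, n, c in table:
    print(f"p = {p:5.3f}   n = {n:8.1f}   capacity = {c:6.4f} bits")
```

Both columns increase with p, but the sample size (and hence a cost linear in it) grows without bound as p approaches 1, while the capacity stays below one bit.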

5.  PURE  COMMUNICATION 

5.1.  As  mentioned,  the  value  of  becoming  informed  about  future  rain  and 
about  future  loss  of  my  savings  is  not  the  same,  even  if  the  "amount  of 
information"  happened  to  be  the  same  in  both  cases.  Nor  is  there  any  reason 
to  suppose  that  the  cost  of  obtaining  the  correct  forecast  would  be  the  same. 

What is the same in both cases is neither the value nor the cost of inquiry.
Rather, it is the cost of transmitting the message. In both cases, exactly
one yes-or-no symbol (one binary digit) needs to be transmitted, corresponding
precisely to the number of bits characterizing the probabilities (50-50) of the
possible messages. And there is presumably a close relation between the number
of bits to be transmitted, and the cost of communication. To transmit 100 binary
digits through the same wire one would need 100 times as many time-units; or,
to use the same time, one would use 100 wires simultaneously, etc.

5.2. The distinction between production and transportation is somewhat
analogous. A gallon of whiskey is more costly to produce, and is more enjoyable
for the consumer, than a gallon of gasoline. But when it comes to transportation
costs a gallon is a gallon. It is quite clear that the originators of the
logarithmic formulas of "information theory" -- Hartley, Shannon -- were fully
aware that they were essentially concerned with the cost of communication, not
with the cost or value of inquiry. But later writers, impressed by the additive
properties of the logarithmic expressions, hailed them as a "measure" of that
elusive entity, information, without explaining what the measurement is for.
(One recent writer, an expert in the theory of probability, claimed that the
measurement permitted one to "treat information like money." But there must be
some economic reason why we don't measure money by the square feet of the bills'
surface!)

5.3. Figure 1, "Inquiring and Deciding," is amplified into Figure 2,
"Inquiring, Communicating, Deciding," by inserting, between Inquiring and
Deciding, intermediate services, also represented by boxes (i.e., viewed as
transformers), and necessary to give account of communication. As a result, the
input of the Deciding box is not identical anymore with the output of the
Inquiring box. While the latter is (as before) "data," the former is now
"messages decoded." Data are transformed into messages decoded through the
operations (services) of encoding, transmitting, and decoding, all
preceded by storing of the data.

5.4. It is more effective, however, to first present the problems and
some results of communication economics (as achieved by the creators of
"information theory") by considering the simplified picture given in Figure 3:
"Communication only." It is obtained from Figure 2 by making the following
special assumptions:

(1) $\lambda$ is an identity matrix and $\kappa_\lambda(x)$ is identically
zero; no distinction, therefore, between events x and data y.

(2) $\mu$ is an identity matrix and $\kappa_\mu(y)$ is identically zero; no
distinction, therefore, between messages to send, m, data y, and (by (1))
events x; that is, "messages to send" (to be denoted by x) enter the criterion
function as an input.

(3) $\alpha$ is an identity matrix and $\kappa_\alpha(m')$ is identically
zero. Thus action, that is, the other input of the criterion function, is
identical with message decoded. Deciding is decoding.

(4) The criterion (or gross payoff or benefit) function has the following
form:

$$\gamma(x, a) = \delta_{xa} = \begin{cases} 1 & \text{if } x = a \\ 0 & \text{if } x \neq a. \end{cases}$$

That is, any error is as important as any other. (However, some later writings,
following one by Shannon in 1959, drop this assumption and deal with a general
"distortion function." I owe this reference to Professor Jacobson of the
University of California at San Diego.)

5.5. The encoding function $e(x) = v$ transforms the (possibly English)
message x into a "word" v which is a sequence of symbols (e.g., binary
digits) $v_1 v_2 \cdots v_n$, say. Transmitting, symbol by symbol, is done using
a "channel" characterized by (a) a Markov matrix $\tau$ with as many rows as
there are possible input symbols, and as many columns as there are possible
output symbols; and (b) the speed of the channel, in symbols per time-unit. The
word put out by the channel is, then, $v' = v'_1 v'_2 \cdots v'_n$, and the
likelihood

$$p(v'_i \mid v_i) = \tau_{v_i v'_i} \quad \text{(independent of } i\text{)}$$

is an element of the channel matrix $\tau$. We can thus write $v' = \tau(v)$,
with $\tau$ a stochastic transformation (as was explained in Section 2 for the
analogous case of $\lambda$). Finally, the decoding operation d transforms the
word v', a sequence of symbols put out by the channel, into a message in the
original language. This decoded message, a, together with the original message
sent, x, are the inputs of the criterion function which, in most of the
literature, is the Kronecker delta, as already mentioned. We have then,

$$a = d(\tau(e(x))),$$

and the gross expected payoff is

$$G = \sum_{\underline{x}} \sum_{\underline{v}} \sum_{\underline{v}'} \sum_{\underline{a}} \pi_{\underline{x}}\, e_{\underline{x}\,\underline{v}}\, \tau_{\underline{v}\,\underline{v}'}\, d_{\underline{v}'\underline{a}}\, \delta_{\underline{x}\,\underline{a}} = 1 - \text{probability of error}$$

(we have underlined the Latin letters to convey that blocks of messages are
transmitted).
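
To make the chain of encoding, transmitting, and decoding concrete, the sketch below evaluates G for one deliberately simple code. Everything specific here is an assumption for illustration: two equally likely messages, a repetition code of odd length n, a binary symmetric channel with symbol error probability q, and majority-vote decoding. None of these choices is taken from the text.

```python
from itertools import product


def encode(x, n):
    """Repetition code: message x in {0, 1} becomes the word (x, ..., x) of length n."""
    return (x,) * n


def decode(word):
    """Majority vote over the received symbols (n is assumed odd)."""
    return 1 if sum(word) > len(word) / 2 else 0


def channel_likelihood(sent, received, q):
    """Probability of the received word given the sent word, each symbol being
    flipped independently with probability q."""
    prob = 1.0
    for s, r in zip(sent, received):
        prob *= q if s != r else 1.0 - q
    return prob


def expected_gross_payoff(prior, n, q):
    """G = 1 - probability of error, summing over messages and received words."""
    total = 0.0
    for x, p_x in enumerate(prior):
        sent = encode(x, n)
        for received in product((0, 1), repeat=n):
            if decode(received) == x:  # Kronecker delta payoff
                total += p_x * channel_likelihood(sent, received, q)
    return total


prior = [0.5, 0.5]
G_uncoded = expected_gross_payoff(prior, n=1, q=0.1)
G_repeat3 = expected_gross_payoff(prior, n=3, q=0.1)
```

Lengthening the word raises G, at the price of a longer code word, which is the trade-off taken up in the following paragraphs.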

5.6. On the other hand, there are costs associated with each of the
transformers (services) involved. Encoding and decoding cost more time
or effort, the greater the length n of the word. And the channel costs the
more, the more reliable, in some sense, is its matrix $\tau$, and the greater
its speed. No conversion rate is known that would make it possible to express
the probability of error, or its complement, in the same units as the
cost-determining properties of the code (e, d) and the channel.

5.7. Instead, as Wolfowitz has pointed out, the problem is stated as one
of determining the set of non-dominated combinations of G and n (and N,
the number of possible words): the efficient set (see case 2) of Section 3).
Fundamental is the theorem due to Shannon which states that, provided the
"uncertainty at source" is less than the "capacity of channel," a code (e, d)
exists that depresses the probability of error as close to zero as desired;
where, in our notation,

uncertainty at source = $-\sum_x \pi_x \log \pi_x$ times the speed of inflow
of messages,

capacity of channel = $\max_p I(p, \tau)$ times the speed of transmission,

where the maximization is over the set of all possible distributions p over
the alphabet of symbol inputs, and $I(p, \tau)$, the "mutual information" of
the symbols $v_i$ and $v'_i$, depends only on the probabilities $p_v$,
$\tau_{vv'}$, and their logarithms. To achieve a small probability of error
with a low-capacity channel, very long code words may be needed. If our problem
were not a pure communication problem, and waiting for the completion of a
coded message implied waiting for a long string of events to happen, the
decision would become obsolete. The existence of almost perfect codes would be
of no practical interest.

In the pure communication situation, however, the messages (not the actual
events) do flow in very rapidly. To illustrate: the economics of pure
communication is not concerned with following the sequence of events "stock
price on Monday, stock price on Tuesday,...," possibly waiting several days to
complete an efficiently coded word; rather, it is concerned with transmitting
the "stored" record of a long series of such events, or an event rich in
dimensions (e.g., the daily Stock Market list of prices). The asymptotic,
long-sequence properties of codes and channels proved in information theory
have therefore little relevance, for example, to the economics of sequential
decision-making (dynamic programming). Capacity as defined in the theory of
communication can be computed for any Markov matrix; but I cannot see that it
can be applied usefully outside of the context of coding and transmitting of
pre-stored records, except, of course, in fields such as acoustics where the
succession of "events" (wave-patterns) is indeed very rapid relative to the
needed succession of decisions.
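
The two quantities defined above can be computed numerically. The following is a minimal sketch in Python, not part of the original argument: the source uncertainty is evaluated directly from the definition, while the maximization over p is approximated by the Blahut-Arimoto iteration, a standard numerical method that is an assumption of this sketch. The function names and the example channel are hypothetical, and both "speed" factors are taken as one symbol per unit of time.

```python
import math

def entropy(pi):
    """Uncertainty at source, -sum(pi * log2(pi)), in bits per message."""
    return -sum(p * math.log2(p) for p in pi if p > 0)

def mutual_information(p, tau):
    """I(p, tau) in bits, for input distribution p and channel matrix tau[v][v']."""
    n_out = len(tau[0])
    # q[w] is the probability of output symbol w under input distribution p
    q = [sum(p[v] * tau[v][w] for v in range(len(p))) for w in range(n_out)]
    total = 0.0
    for v, pv in enumerate(p):
        for w in range(n_out):
            if pv > 0 and tau[v][w] > 0:
                total += pv * tau[v][w] * math.log2(tau[v][w] / q[w])
    return total

def capacity(tau, iterations=200):
    """Approximate max over p of I(p, tau) by Blahut-Arimoto iteration (assumed method)."""
    n_in, n_out = len(tau), len(tau[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    for _ in range(iterations):
        q = [sum(p[v] * tau[v][w] for v in range(n_in)) for w in range(n_out)]
        # c[v] = exp of the divergence between row v of tau and the output distribution q
        c = [math.exp(sum(tau[v][w] * math.log(tau[v][w] / q[w])
                          for w in range(n_out) if tau[v][w] > 0))
             for v in range(n_in)]
        norm = sum(p[v] * c[v] for v in range(n_in))
        p = [p[v] * c[v] / norm for v in range(n_in)]
    return mutual_information(p, tau), p

# Hypothetical example: a binary symmetric channel with error probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
cap, best_p = capacity(bsc)
```

For the symmetric channel of the example the maximizing input distribution is uniform, and the capacity equals one bit minus the entropy of the error probabilities. Nothing in this computation involves benefits or the timing of events, which is the point of the paragraph above.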


6. INQUIRING, STORING, COMMUNICATING, DECIDING

6.1.  We  now  remove  the  assumptions  (l)-(l)  made  in  the  previous  section, 
where the pure communication problem was defined. That is, we shall consider
now the sequence of services presented in Figure 2. The gross payoff to the
manager depends, not on the messages received compared with the messages sent,
but, rather, on the events of the external world, combined with his actions;
and his actions do not consist in merely decoding (translating from the
language of the channel into ordinary language). Note in particular the
transformer "storing" (transforming data into messages, with a time delay
necessary to accumulate a "block" of data into an efficiently encodable
message). This box did not appear in Figures 1 and 3, where communication
was, in effect, separated from the services of inquiry about events, and of
decision about actions. Without the storing of data the study of coding and
transmitting long sequences of messages, which is the core of the theory of
communication, becomes irrelevant to economics.

6.2. The generalization of the expressions for the expected payoff and
expected cost that were given in Sections 2 and 3 and at the bottom of Figure 1
is straightforward. The sequence of services, ηα, becomes now ησετδα. (As
before, we may consider all these services to be, in general, noisy; if not, the
"degeneration" of a Markov matrix into one consisting of 1's and 0's only
is easily handled.) We have remarked before that a somewhat "noisy" decider
may be cheap. The coding operations (ε, δ) are often conceived as rigid rules,
but we should also think of the complex cases where coding (sometimes called
programming in this context), at the present state of technology and of human
skills, must be performed as an "art," subject to many trials and errors.
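
The generalization can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's own formula: it assumes that each service responds only to the output of the service before it, so that the chain from events to actions is the product of the six Markov matrices, and it treats the cost of each service as a fixed number rather than as a function. All names and numbers are hypothetical.

```python
def matmul(a, b):
    """Product of two row-stochastic matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def compose(*services):
    """Multiply the service matrices in order: inquiring first, deciding last."""
    result = services[0]
    for matrix in services[1:]:
        result = matmul(result, matrix)
    return result

def expected_benefit(pi, chain, benefit):
    """Sum over events x and actions a of pi[x] * chain[x][a] * benefit[x][a]."""
    return sum(pi[x] * chain[x][a] * benefit[x][a]
               for x in range(len(pi)) for a in range(len(chain[0])))

# Hypothetical two-event, two-action example.
identity = [[1.0, 0.0], [0.0, 1.0]]    # a noiseless ("degenerate") service
pi = [0.5, 0.5]                        # prior probabilities of the events
eta = [[0.9, 0.1], [0.2, 0.8]]         # noisy inquiring
tau = [[0.95, 0.05], [0.05, 0.95]]     # noisy transmitting
benefit = [[1.0, 0.0], [0.0, 1.0]]     # benefit 1 when the action matches the event
costs = {"inquiring": 0.05, "storing": 0.01, "encoding": 0.01,
         "transmitting": 0.03, "decoding": 0.01, "deciding": 0.02}  # assumed constants

# Storing, encoding, decoding and deciding are taken as noiseless here.
chain = compose(eta, identity, identity, tau, identity, identity)
gross = expected_benefit(pi, chain, benefit)
net = gross - sum(costs.values())
```

Replacing any identity matrix by a noisy one (for instance, a cheap but somewhat "noisy" decider) changes both the gross expected benefit and the total cost, and the comparison of the resulting net values is the manager's problem.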

6.3. Although, in Figure 2, a cost function (kappa with an appropriate
subscript) is assigned to each service, the accounting practices may or may not
have caught up with this task. We have noted, in particular, in Section 5,
that the efficient sets that the communication theory strives to construct
have dimensions such as "length of code word" (or the expectation of this
length), rather than "cost" (or "expected cost") of a code word. In Section 3,
we discussed an efficient set of only two dimensions: "expected gross payoff"
(or "benefit") and "expected cost"; the conversion rate between the two being
possibly unknown. Perhaps further dimensions must be added pending further
research into the monetary cost of coding operations and of prices or rentals
of transmitting channels.
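
The two-dimensional efficient set can be sketched as follows. This is a minimal illustration in Python with hypothetical system names and numbers; the function names are assumptions of the sketch. A system is kept if no other system has at least as much expected benefit at no greater expected cost, with strict improvement in one of the two.

```python
def dominates(a, b):
    """True if pair a = (benefit, cost) is at least as good as b in both and better in one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def efficient_set(systems):
    """Names of the systems whose (expected benefit, expected cost) pair is not dominated."""
    return sorted(name for name, pair in systems.items()
                  if not any(dominates(other, pair) for other in systems.values()))

def best_system(systems, rate):
    """Optimal system once the conversion rate between cost and benefit is known."""
    return max(systems, key=lambda name: systems[name][0] - rate * systems[name][1])

# Hypothetical feasible systems: name -> (expected benefit, expected cost).
systems = {"A": (10.0, 4.0), "B": (8.0, 5.0), "C": (12.0, 7.0), "D": (6.0, 2.0)}
frontier = efficient_set(systems)      # B is dominated by A
choice = best_system(systems, 1.0)     # with rate 1 the net values are 6, 3, 5, 4
```

Whatever the (positive) conversion rate, the chosen system lies in the efficient set, so the set can be constructed before the rate is known. Adding a third dimension, such as expected length of code word, would only require extending the comparison in `dominates`.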

6.4. A terminological remark is in order, and should have been made
earlier. The manager decides about hiring, among other things, a "deciding
service," to be performed by a human or possibly a machine. We have given the
example of hiring an employee in charge of deciding about re-ordering for
inventories. He must be distinguished from the manager, who makes the
"meta-decision" (also called "organizational decision") as to which information
services or instruments to use, including the services and instruments for
"lower-level" decisions. (It is easy to conceive and philosophize about the
infinite regress of meta-meta-deciders, etc., but we shall not do it here.)

6.5. It is essential to remember that the various services must, in
principle, be chosen jointly. The choices of a channel and of a code are
interdependent, and both are also interdependent with the decision (a
"meta-decision" in the sense just defined) as to how detailed or coarse, or how
noisy, the inquiry operation should be, and how detailed a message could be
typically handled by the deciding employee. The situation is analogous to that
of a manufacturer who must decide whether the fuel for his operations should be
brought in by rail or by road; this decision must, of course, be made
simultaneously with the decision whether to use coal or oil.

6.6. To be sure, it is simpler to neglect the interdependence between
the services constituting an information system. As a first approximation
their separability is assumed, and the resulting loss in utility (the
"sub-optimization") is accepted. But progress can be expected towards improving
the system by taking account of interdependencies between its components. This
is quite similar to the progress from a primitive factory design to a modern
layout. (Incidentally, the assumption of a "decomposable" utility, linear in
the various criteria, such as cost and benefit, is a similar simplification,
possibly to be overcome in due course.)
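
The loss from sub-optimization can be made concrete with the fuel and transport example of 6.5. The sketch below is illustrative only: the utility table is invented, and the "separate" procedure (fix a default transport, choose the fuel, then choose the transport for that fuel) is one assumed way of neglecting the interdependence.

```python
# Hypothetical net utilities of each (fuel, transport) combination.
net_utility = {
    ("coal", "rail"): 10, ("coal", "road"): 4,
    ("oil", "rail"): 6, ("oil", "road"): 9,
}

def joint_choice(utility):
    """Choose fuel and transport simultaneously."""
    return max(utility, key=utility.get)

def separate_choice(utility, default_transport):
    """Choose the fuel for a given default transport, then the transport for that fuel."""
    fuels = sorted({fuel for fuel, _ in utility})
    transports = sorted({transport for _, transport in utility})
    fuel = max(fuels, key=lambda f: utility[(f, default_transport)])
    transport = max(transports, key=lambda t: utility[(fuel, t)])
    return (fuel, transport)

joint = joint_choice(net_utility)
separate = separate_choice(net_utility, "road")
loss = net_utility[joint] - net_utility[separate]  # the accepted "sub-optimization" loss
```

With these numbers the separate procedure settles on oil by road, while the joint choice is coal by rail; the difference in utility is the price paid for assuming separability.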

7.  DYNAMIC  AND  STOCHASTIC  EXTENSIONS 

7.1. So far, the symbols denoting our variables (x, y, m, v, v', m', a,
g, k) have not been dated, although verbal statements were made as to the
time delay involved in storing, and as to the various services being more or
less costly in terms of the time needed for their performance. An account of
the processes in time is needed for a proper description of the system and for
the evaluation of the benefits and costs (which must be "discounted" for time
in any calculation of utility), even in the case when a single decision is to
be taken once and for all, as in the case of a simple construction or
acquisition project. More usually, the benefits and costs depend on a sequence
of decisions, and a sequence of events. A decision to be taken in December will
make use of messages about the events of the earlier months of the year, and
also take account of the impact of previous decisions.

7.2. Accordingly, one might visualize one sheet such as Figure 3, for
each consecutive date, with in-and-output arrows crossing the three-dimensional
stack of such sheets. Alternatively, an elaborate network of dated feedback
arrows can be used on a sheet. (I believe specialists in information storing
and retrieval are working on such problems.)


7.3. In our earlier presentation (as in Section 2.5, for example), the
probability distribution of events x was regarded as non-controlled by the
manager, as one of the givens in his problem. One approach used in dynamic
programming is to conceive of a sequence of probability distributions,
conditional upon the sequence of decisions; the initial, or prior, conditional
distribution is followed by a sequence of posterior conditional distributions,
revised on the basis of the accumulated sequence of data. Thus only the prior
distribution is "given."
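
The revision of the prior into a sequence of posterior distributions can be sketched with Bayes' rule. The example below is a simplified illustration: it assumes that successive data are independent given the event, and it omits the dependence of the distributions on the sequence of decisions that the text mentions. The names and numbers are hypothetical.

```python
def posterior(prior, likelihood, datum):
    """Bayes' rule: revise prior[x] by likelihood[x][datum] and renormalize."""
    joint = [prior[x] * likelihood[x][datum] for x in range(len(prior))]
    total = sum(joint)
    return [value / total for value in joint]

def update_sequence(prior, likelihood, data):
    """Return the prior followed by the posterior after each successive datum."""
    distributions = [prior]
    for datum in data:
        distributions.append(posterior(distributions[-1], likelihood, datum))
    return distributions

# Hypothetical example: two events, two possible data values.
prior = [0.5, 0.5]
likelihood = [[0.8, 0.2], [0.3, 0.7]]   # likelihood[x][y] = probability of datum y given event x
history = update_sequence(prior, likelihood, [0, 0])
```

Only `prior` is supplied from outside; every later distribution in `history` is derived from it and from the accumulated data.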

7.4. In addition to, or independently of, this "dynamic extension" of
the problem, another extension, or generalization, is often considered,
especially by statisticians, and may be called "stochastic." To illustrate,
let the action variable have two values: to operate or not to operate on the
patient. The events may be "he has cancer," and "not so," with probabilities
π_1 and π_2, respectively. However, if he has cancer, the number of years
left to him (the "benefit") if action "operate" is taken, is itself a random
variable. And the appropriate way to characterize the event "cancer" is to
give a probability distribution that will be transformed, if the operation is
performed, into a certain distribution, and if the operation is not performed,
into another distribution of the number of years left to the patient; and
similarly, the event "no cancer" is best represented by a probability
distribution. Thus the variable x, which influences the "data" y, must be
conceived as a "statistical hypothesis," a probability distribution (whether or
not conveniently represented by some numerical parameters). Accordingly the
benefit and the cost are random variables (whose expectations must be
evaluated), not only because the "event" x is subject to a probability
distribution π, and because inquiry is, and possibly the other transformers
are, noisy, but also because, in general, x itself is a probability
distribution (possibly represented by one or more statistical parameters, whose
"prior" distribution is given by π).
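
The two layers of randomness in this example can be shown in a short calculation. The sketch below is purely illustrative: every probability and every number of years is invented for the purpose, and the names are hypothetical. Each pair (event, action) carries its own distribution of years left, and the expectation is taken first within that distribution and then over the prior π.

```python
def mean(distribution):
    """Expected value of a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in distribution.items())

def expected_years(pi, outcome, action):
    """Average over events (weights pi) of the mean years left under the given action."""
    return sum(pi[event] * mean(outcome[(event, action)]) for event in pi)

# Invented numbers, for illustration only.
pi = {"cancer": 0.3, "no cancer": 0.7}
outcome = {
    ("cancer", "operate"): {10: 0.6, 1: 0.4},
    ("cancer", "do not operate"): {2: 1.0},
    ("no cancer", "operate"): {20: 0.9, 0: 0.1},
    ("no cancer", "do not operate"): {20: 1.0},
}
operate = expected_years(pi, outcome, "operate")
refrain = expected_years(pi, outcome, "do not operate")
```

An inquiry (a diagnostic test, say) would replace `pi` by a posterior distribution before the comparison is made; the distributions inside `outcome` would remain, which is why the benefit stays a random variable even under perfect inquiry.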

8.  THE  MARKET  IN  INFORMATION  SERVICES 

8.1. We have assumed the cost functions of the different services to be
given. Thus for any given likelihood matrix η, characterizing an inquiring
service, the price κ_η(η) is known to the manager. So is the price
of a given channel matrix τ. (Two channels with the same capacity need not
have the same price; for long sequences of messages, they would contribute
equally to the expected benefit; and the manager would prefer the cheaper one.)
The prices can indeed be considered given to the manager under the regime of
competition of numerous firms, facing numerous suppliers, and with no coalitions
of suppliers of the information services (thus, no unions of communication
workers or computer programmers). If this is not so, then it is not true that
the cost functions are given to the manager; he can influence them, depending
on his relative bargaining powers. The givens of his problem are, then, these
powers (properly defined), and not the prices themselves.

8.2. Whether in a competitive market or not, the price
of an information service depends on the way in which the total demand of
all managers for a given service depends on its price, and the way in which
the total supply of this service depends on this price.

8.3. Our previous discussion explains how the manager should determine
his demand for various information services if he has a clear picture of what
benefits he wants to achieve. If all managers' ideas of their desired benefits
(and also their "prior" ideas about the external world, the distribution π)
were known, and if they followed the advice of a management scientist, the
total demand for the information services, at any given set of prices, could be
evaluated. As it is, one has simply to take some existing state of demand as
a fact, a subject of day-to-day market research.

8.4. As to the supply of information services, it depends, of course, on
the state of technology (for machines) and of education and training (for men).
A comparison between machines and men, and estimation of future trends in their
comparative performance, is fascinating and is occupying many minds. As I
understand it, compared with the present machines, present man is a very
inferior transmission channel and a very poor storer of information. On the
other hand, he seems, so far, to be unexcelled in many forms of coding,
especially for transmitting to other men (e.g., in efficiently adjusting the
language to the particularities of the receiver), but apparently also to
machines (thus, "programming into computers is still an art, not a science";
otherwise it would all be done by machines!). Moreover, current studies in the
psychology of language and of information tend to show that, for example, two
"inquiring services" that are equivalent in terms of our mathematical
definitions (for example, the readings on two instruments with equally fine
scales) may be different in an economically relevant sense (vertical scales are
read more slowly than horizontal ones). Also, a finer partition of the set of
events (e.g., identifying a two-dimensional phenomenon) seems sometimes to
require less effort than a coarser partition (e.g., identifying only one
dimension), contrary to the guesses that would lead to an easy postulation
of an "economic equilibrium."

8.5. It has been estimated that information services as we have defined
them constitute 40% or more of the Gross National Product of this country.
Hence the public interest in having both the technology and the skills in these
fields improved. The purpose of the present paper is merely to contribute to
a clearer understanding of the relevant concepts from the point of view of a
"manager" (an "organizer," a "meta-decider").